92 research outputs found
The density connectivity information bottleneck
Clustering with the agglomerative Information Bottleneck (aIB) algorithm suffers from a sub-optimality problem: it cannot guarantee to preserve as much relevant information as possible. To handle this problem, we introduce a density connectivity chain, by which we consider not only the information between two data elements but also the information among the neighbors of a data element. Based on this idea, we propose DCIB, a Density Connectivity Information Bottleneck algorithm that applies the Information Bottleneck method to quantify the relevant information during the clustering procedure. As a hierarchical algorithm, DCIB produces a pruned clustering tree structure and yields clustering results at different sizes in a single execution. Experimental results on document clustering indicate that the DCIB algorithm preserves more relevant information and achieves higher precision than the aIB algorithm.
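The greedy step underlying agglomerative Information Bottleneck clustering can be sketched concretely. The following is a minimal illustration (not the paper's DCIB algorithm) of the standard aIB merge criterion: at each step, merge the pair of clusters whose merger loses the least relevant information, measured by a prior-weighted Jensen-Shannon divergence between their conditional distributions. The cluster names and toy distributions are invented for the example.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge_cost(w_i, w_j, p_y_i, p_y_j):
    """aIB merge cost: relevant information lost by merging clusters i and j.

    Equals (w_i + w_j) * JS_{pi_i, pi_j}(p(y|c_i), p(y|c_j)), a Jensen-Shannon
    divergence weighted by the clusters' relative priors."""
    w = w_i + w_j
    pi_i, pi_j = w_i / w, w_j / w
    p_bar = [pi_i * a + pi_j * b for a, b in zip(p_y_i, p_y_j)]
    return w * (pi_i * kl(p_y_i, p_bar) + pi_j * kl(p_y_j, p_bar))

# Toy example: three document clusters with prior weights and word distributions.
clusters = {
    "c1": (0.4, [0.7, 0.2, 0.1]),
    "c2": (0.35, [0.6, 0.3, 0.1]),
    "c3": (0.25, [0.1, 0.2, 0.7]),
}
names = list(clusters)
best = min(
    ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
    key=lambda ab: merge_cost(clusters[ab[0]][0], clusters[ab[1]][0],
                              clusters[ab[0]][1], clusters[ab[1]][1]),
)
print(best)  # the greedy aIB step merges the most similar pair: ('c1', 'c2')
```

DCIB's contribution is to temper this purely pairwise criterion with information from a data element's density-connected neighbors; the pairwise cost above is the baseline it improves on.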
Intelligent techniques for recommender systems
This thesis focuses on the data sparsity issue and the temporal dynamics issue in the context of collaborative filtering, and addresses them with imputation techniques, low-rank subspace techniques, and optimization techniques from a machine learning perspective. A comprehensive survey of the development of collaborative filtering techniques is also included.
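The combination of imputation and low-rank subspace techniques against data sparsity can be illustrated in a few lines. This is a generic sketch of that idea, not the thesis's actual algorithms: missing ratings are first imputed with item means, then the filled matrix is projected onto its top-k singular subspace to produce dense rating estimates.

```python
import numpy as np

# Toy user-item rating matrix; 0 marks a missing rating (data sparsity).
R = np.array([[5., 4., 0., 1.],
              [4., 0., 1., 1.],
              [1., 1., 0., 5.],
              [0., 1., 5., 4.]])
mask = R > 0

# Step 1 (imputation): fill each missing entry with that item's observed mean.
item_means = R.sum(axis=0) / mask.sum(axis=0)
R_filled = np.where(mask, R, item_means)

# Step 2 (low-rank subspace): project onto the top-k singular subspace.
k = 2
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.round(R_hat, 2))  # dense low-rank estimate usable for recommendation
```

The rank constraint is what lets information flow between users: each predicted rating is reconstructed from shared latent factors rather than from the sparse raw entries alone.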
Intensity-free Integral-based Learning of Marked Temporal Point Processes
In marked temporal point processes (MTPPs), a core problem is to parameterize the conditional joint PDF (probability density function) for the inter-event time and the mark, conditioned on the history. The majority of existing studies predefine intensity functions. Their utility is challenged by the difficulty of specifying the intensity function's proper form, which is critical to balancing expressiveness and processing efficiency. Recently, some studies have moved away from predefining the intensity function: one line of work models the time and mark distributions separately, while another focuses on temporal point processes (TPPs), which do not consider marks. This study aims to develop high-fidelity models for discrete events where the event marks are either categorical or numeric in a multi-dimensional continuous space. We propose a solution framework, IFIB (Intensity-Free Integral-Based process), that models the conditional joint PDF directly, without intensity functions. This remarkably simplifies the process of enforcing the essential mathematical restrictions. We show the desired properties of IFIB and its superior experimental results on real-world and synthetic datasets. The code is available at \url{https://github.com/StepinSilence/IFIB}.
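The intensity-free, integral-based idea can be made concrete with a toy marked model. This is a hedged sketch in the spirit of the abstract, not the paper's network: we parameterize the joint integral F(m, t) = P(mark = m, inter-event time <= t) directly in closed form, so the joint PDF is its time derivative and no intensity function is ever specified. The marks, weights, and decay rates are invented for the example.

```python
import math

# Toy "integral-based" marked model: parameterize the joint integral
#   F(m, t) = P(mark = m, inter-event time <= t) = w_m * (1 - exp(-a_m * t))
# directly; the joint PDF p(m, t) = dF/dt needs no intensity function.

w = {"buy": 0.3, "click": 0.7}   # mark weights, summing to 1 (assumed)
a = {"buy": 0.5, "click": 2.0}   # per-mark time-decay parameters (assumed)

def F(m, t):
    """Joint integral: defective CDF over time for mark m."""
    return w[m] * (1.0 - math.exp(-a[m] * t))

def pdf(m, t):
    """Closed-form joint PDF obtained by differentiating F in t."""
    return w[m] * a[m] * math.exp(-a[m] * t)

# The essential mathematical restrictions are enforced on the integral itself:
# F(m, 0) = 0, F is nondecreasing in t, and sum over marks of F(m, inf) = 1.
total_mass = sum(F(m, 1e6) for m in w)
print(round(total_mass, 6))  # -> 1.0
```

The point of working with the integral is that normalization and monotonicity are properties of F itself, so they are easy to compel by construction, whereas an intensity-based model must integrate the intensity to recover the same quantities.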
Mining Medical Data: Bridging the Knowledge Divide
Due to the significant amount of data generated by modern medicine, there is a growing reliance on tools such as data mining and knowledge discovery to help make sense of and comprehend such data. The success of this process requires collaboration and interaction between such methods and medical professionals. Therefore an important question is: how can we strengthen the relationship between two traditionally separate fields (technology and medicine) so that they work simultaneously towards enhancing knowledge in modern medicine? To address this question, this study examines the application of data mining techniques to a large asthma medical dataset. A discussion is proposed introducing various methods for a smooth approach, straying from the 'jack of all trades, master of none' towards a modular, cooperative approach for a successful outcome. The results of this study support the use of data mining as a useful tool and highlight the advantages, on a global scale, of closer relations between the two distinct fields. The exploration of the CRISP-DM methodology suggests that a 'one methodology fits all' approach is not appropriate; rather, methodologies should be combined to create a hybrid, holistic approach to data mining.
G-CREWE: Graph CompREssion With Embedding for Network Alignment
Network alignment is useful for multiple applications that require
increasingly large graphs to be processed. Existing research approaches this as
an optimization problem or computes the similarity based on node
representations. However, the process of aligning every pair of nodes between
relatively large networks is time-consuming and resource-intensive. In this
paper, we propose a framework, called G-CREWE (Graph CompREssion With
Embedding) to solve the network alignment problem. G-CREWE uses node embeddings
to align the networks on two levels of resolution, a fine resolution given by
the original network and a coarse resolution given by a compressed version, to
achieve an efficient and effective network alignment. The framework first
extracts node features and learns the node embedding via a Graph Convolutional
Network (GCN). Then, node embedding helps to guide the process of graph
compression and finally improve the alignment performance. As part of G-CREWE,
we also propose a new compression mechanism called MERGE (Minimum dEgRee
neiGhbors comprEssion) to reduce the size of the input networks while
preserving the consistency of their topological structure. Experiments on real networks show that our method is more than twice as fast as the most competitive existing methods while maintaining high accuracy.

Comment: 10 pages, accepted at the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020).
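The minimum-degree merging idea behind MERGE can be sketched on a toy graph. The following is an illustrative simplification (inspired by, not identical to, the paper's mechanism): repeatedly absorb the lowest-degree node into its lowest-degree neighbor, rewiring edges, until the graph reaches a target size. The tie-breaking and target size are assumptions of the sketch.

```python
def compress(adj, target_size):
    """Shrink an undirected graph by minimum-degree neighbor merging.

    Repeatedly absorbs the lowest-degree node into its lowest-degree
    neighbor, rewiring the absorbed node's edges, until `target_size`
    nodes remain. Returns the compressed adjacency and a merge map."""
    adj = {u: set(vs) for u, vs in adj.items()}
    merged_into = {}
    while len(adj) > target_size:
        u = min(adj, key=lambda n: (len(adj[n]), n))      # lowest-degree node
        if not adj[u]:
            break                                          # isolated node: stop
        v = min(adj[u], key=lambda n: (len(adj[n]), n))    # its lowest-degree neighbor
        for w in adj.pop(u):                               # rewire u's edges to v
            adj[w].discard(u)
            if w != v:
                adj[w].add(v)
                adj[v].add(w)
        merged_into[u] = v
    return adj, merged_into

graph = {1: {2}, 2: {1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
small, mapping = compress(graph, 3)
print(sorted(small), mapping)  # -> [3, 4, 5] {1: 2, 2: 3}
```

Aligning the compressed graphs first, then refining the match on the original graphs, is what gives the two-resolution speedup described in the abstract.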
A Radiation Viewpoint of Reconfigurable Reflectarray Elements: Performance Limit, Evaluation Criterion and Design Process
Reconfigurable reflectarray antennas (RRAs) have developed rapidly, with various prototypes proposed in the recent literature. However, designing wideband, multiband, or high-frequency RRAs faces great challenges, especially the lengthy simulation time due to the lack of systematic design guidance. The
current scattering viewpoint of the RRA element, which couples antenna
structures and switches during the design process, fails to address these
issues. Here, we propose a novel radiation viewpoint to model, evaluate, and
design RRA elements. Using this viewpoint, the design goal is to match the
element impedance to a characteristic impedance pre-calculated by switch
parameters, allowing various impedance matching techniques developed in
classical antennas to be applied in RRA element design. Furthermore, the
theoretical performance limit can be pre-determined for given switch parameters before designing specific structures, and the constant-loss curve is suggested as an intuitive tool for evaluating element performance on the Smith chart. The
proposed method is validated by a practical 1-bit RRA element with degraded
switch parameters. Then, a 1-bit RRA element with wideband performance is
successfully designed using the proposed design process. The proposed method
provides a novel perspective of RRA elements, and offers a systematic and
effective guidance for designing wideband, multiband, and high-frequency RRAs.

Comment: Accepted by IEEE Transactions on Antennas and Propagation.
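The evaluation step in this viewpoint reduces to standard impedance-matching arithmetic. The following is a generic sketch of that arithmetic, not the paper's method: given an element impedance per switch state and a reference impedance, the reflection coefficient magnitude gives the element loss in dB. The impedance values are invented for illustration.

```python
import cmath
import math

def reflection_loss_db(z_element, z0):
    """Reflection coefficient magnitude -> element loss in dB.

    For a reflective element, power not reflected (|Gamma| < 1) is
    dissipated, so loss_dB = -20 * log10(|Gamma|)."""
    gamma = (z_element - z0) / (z_element + z0)
    return -20.0 * math.log10(abs(gamma))

# Hypothetical 1-bit element: two switch states give two effective impedances.
z0 = complex(50, 0)                            # assumed reference impedance
z_on, z_off = complex(5, 40), complex(5, -40)  # assumed lossy switch states

loss_on = reflection_loss_db(z_on, z0)
loss_off = reflection_loss_db(z_off, z0)
phase_on = cmath.phase((z_on - z0) / (z_on + z0))
phase_off = cmath.phase((z_off - z0) / (z_off + z0))
print(round(loss_on, 2), round(math.degrees(phase_on - phase_off) % 360, 1))
```

A design process in this spirit would sweep candidate element impedances against the switch-determined reference, reading loss and reflection phase off the Smith chart rather than re-simulating the full structure for each state.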
What Will You Do for the Rest of the Day?
Understanding and predicting human mobility is vital to a large number of applications, ranging from recommendations to safety and urban service planning. In some travel applications, the ability to accurately predict the user's future trajectory is vital for delivering high quality of service. The accurate prediction of detailed trajectories would empower location-based service providers with the ability to deliver more precise recommendations to users. Existing work on human mobility prediction has mainly focused on the prediction of the next location (or the set of locations) visited by the user, rather than on the prediction of the continuous trajectory (sequences of further locations and the corresponding arrival and departure times). Furthermore, existing approaches often return predicted locations as regions with coarse granularity rather than geographical coordinates, which limits the practicality of the prediction.
In this paper, we introduce a novel trajectory prediction problem: given historical data and a user's initial trajectory in the morning, can we predict the user's full trajectory later in the day (e.g. the afternoon trajectory)? The predicted continuous trajectory includes the sequence of future locations, the stay times, and the departure times. We first conduct a comprehensive analysis of the relationship between morning trajectories and the corresponding afternoon trajectories, and find that there is a positive correlation between them. Our proposed method combines similarity metrics over the extracted temporal sequences of locations to estimate similar informative segments across user trajectories.
Our evaluation shows results on both labeled and geographical trajectories, with a prediction error reduced by 10-35% in comparison to the baselines. This improvement has the potential to enable precise location services, substantially raising their usefulness to users. We also present empirical evaluations with a Markov model and Long Short-Term Memory (LSTM), a state-of-the-art recurrent neural network model. Our proposed method is shown to be more effective when a smaller number of samples is used and is exponentially more efficient than LSTM.
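The similarity-based retrieval idea in the abstract can be sketched with a simple nearest-neighbor baseline. This is an illustrative simplification, not the paper's method: today's observed morning is compared against historical mornings with a normalized longest-common-subsequence similarity, and the afternoon of the best-matching day is returned as the prediction. The location labels and history are invented for the example.

```python
def lcs_len(a, b):
    """Longest-common-subsequence length between two location sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def predict_afternoon(morning, history):
    """Return the afternoon of the historical day whose morning trajectory
    is most similar (normalized LCS) to today's observed morning."""
    best_day = max(
        history,
        key=lambda d: lcs_len(morning, d["morning"]) / max(len(morning), len(d["morning"])),
    )
    return best_day["afternoon"]

history = [
    {"morning": ["home", "cafe", "office"], "afternoon": ["office", "gym", "home"]},
    {"morning": ["home", "school", "park"], "afternoon": ["park", "mall", "home"]},
]
print(predict_afternoon(["home", "cafe", "office"], history))
# -> ['office', 'gym', 'home']
```

A full solution would also predict stay and departure times per location; the retrieval step above only illustrates how sequence similarity selects informative historical segments.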